Abstract


Facial recognition methods were first explored in security systems to identify and compare human faces, and are regarded here as far superior to other biometric techniques such as iris recognition. The technique has also been applied to iris recognition, image detection, and related tasks. Recently these methods have been explored in other fields of study and have become a commercial identification and marketing tool. This paper describes different facial recognition algorithms and compares their recognition accuracies; of the models compared, CNN yielded the maximum accuracy.
1. Introduction


A human face is a unique characteristic that differs from person to person; face recognition is therefore a credible means of identification alongside fingerprint scanners (Rodavia et al., 2017). Face recognition is popular and widely used for personnel identification. An automatic facial recognition system applies an artificial intelligence system to recognize human faces under varying circumstances. Today the study of facial recognition attracts keen interest in pattern recognition, computer vision, and other related fields. A camera is the only device a face recognition system requires. Face recognition provides inexpensive and reliable personal identification that is applicable in many fields (Phankokkruad and Jaturawat, 2017). It is cheaper than other biometric forms of identification and can be deployed anywhere on a low budget.


Recognition accuracy is an important factor in a facial recognition system, and many factors affect it. Environmental factors, image quality, and shifting and scaling of images are common examples. Sometimes these factors make an image non-ideal for recognition and decrease accuracy (Phankokkruad and Jaturawat, 2017). Other factors that affect accuracy include face shape, texture, spectacles, hair, and illumination. Several of these external factors are uncontrollable. However, the published literature on face recognition algorithms shows that each algorithm has its own characteristics and provides good accuracy under different conditions.
This research studies some well-known face recognition algorithms and compares their recognition accuracies on both the train and test sets. Eigenfaces, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Convolutional Neural Network (CNN) are chosen for this experimental study. Variation in face viewpoint is the factor used in the experiment to study the effect on recognition accuracy. In this way the advantages and disadvantages of the different algorithms can be studied. Consequently, it will help developers choose the best facial recognition algorithm for their field of implementation.
Figure 1: Flow-Chart


2. Literature review 


2.1. Eigenfaces algorithm with PCA 


The eigenfaces algorithm is one of the most commonly used methods in the field of facial recognition. Çarıkçı and Özen (2012) studied the eigenfaces method, in which the training face with the smallest Euclidean distance to the query face is taken as the person it resembles most. We split the set of images into a training and a test set. The face images in the training set are converted into face vectors $\Gamma_i$. The face vectors are then standardized by computing their mean and subtracting this mean face from each face vector to obtain the normalized face vector $\Phi_i$ (Paul et al., 2018):

$\Phi_i = \Gamma_i - \Psi$

where $\Psi = \frac{1}{M}\sum_{i=1}^{M}\Gamma_i$ is the average face vector.

The covariance matrix $C$ is given by

$C = \frac{1}{M}\sum_{n=1}^{M}\Phi_n\Phi_n^T = AA^T$ (an $N^2 \times N^2$ matrix)

where $A = [\Phi_1, \Phi_2, \ldots, \Phi_M]$; each normalized face vector forms one column of $A$, so $A$ is an $N^2 \times M$ matrix (Paul et al., 2018). Choosing k significant eigenvectors from this $N^2$-dimensional eigenvector space is computationally difficult. Therefore, we use Principal Component Analysis (PCA), where we treat the face vector space as a lower-dimensional subspace and recompute the covariance matrix as $C = A^TA$, which is of dimension $M \times M$. Similar methods have been researched further by Chen and Jenkins (2017). It now becomes easier to find the k significant eigenvectors of the face vector space. The covariance matrix returns M eigenvectors, each of dimension $M \times 1$. After PCA, the k best eigenfaces, those that explain the maximum variance, are selected such that $k < M$, where M is the size of the training set. The selected k eigenfaces must have the original dimensionality of the face vector space, so we map them back by $u_i = A v_i$, where $u_i$ is the eigenvector in the higher-dimensional space and $v_i$ is the eigenvector in the lower-dimensional space. Detsing and Ketcham (2017) also used PCA with eigenfaces in their research. Paul et al. (2018) found in their review that PCA with eigenfaces is most commonly used to extract the features of a face that distinguish one person from another. The dimensionality reduction not only made the computational problem easier but also helped reduce noise that could have affected our results. We now represent each face in the training set as the average face vector plus a linear combination of the k eigenvectors, weighted by the weights $w$. The weight vector $\Omega_i = [w_1, w_2, \ldots, w_k]$ is the eigenface representation of the $i$-th face, and the weights are calculated for each face. Chakrabarti and Dutta (2013) also used PCA with eigenfaces in their research on facial expressions. The flowchart of our eigenfaces algorithm is shown in Figure 2.
Figure 2: Flowchart for eigenfaces algorithm
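The eigenfaces steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `eigenfaces`, the random stand-in data, and the choice `k=3` are all hypothetical.

```python
import numpy as np

def eigenfaces(faces, k):
    """faces: (M, N2) array with one flattened face per row (hypothetical layout)."""
    mean_face = faces.mean(axis=0)            # average face vector
    A = (faces - mean_face).T                 # columns are normalized faces, shape (N2, M)
    small_cov = A.T @ A                       # M x M matrix instead of N2 x N2
    eigvals, V = np.linalg.eigh(small_cov)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    U = A @ V[:, top]                         # map back to image space: u_i = A v_i
    U /= np.linalg.norm(U, axis=0)            # scale each eigenface to unit length
    weights = (faces - mean_face) @ U         # row i holds the weights of face i
    return mean_face, U, weights

# Random stand-in data: 6 "faces" of 8x8 pixels (assumption for illustration only).
rng = np.random.default_rng(0)
faces = rng.random((6, 64))
mean_face, U, weights = eigenfaces(faces, k=3)
```

Working with the small M x M matrix keeps the eigen-decomposition cheap when there are far fewer images than pixels, which is the motivation given in the text.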


3. Haar-Cascade Classifiers Viola Jones 


Object detection is a method for identifying the existence of an object of a certain class (Soo, 2014) by means of image processing. Objects can be classified based on texture, color, and shape. One can employ a color-coded approach to identify an object, but this approach has a downside: lighting conditions play a very important role in detecting the object. Object detection based on features such as shape has been employed to overcome the limitations of the color-coded approach.


Viola-Jones is based on:
• Integral image
• Haar feature selection
• Learning classifier with AdaBoost
• Cascade structure


Haar-like features consider adjacent rectangular regions in a detection window, sum the pixel values in each region, and calculate the difference between these sums, which is then used to categorize subsections of an image. For example, in a face, the eye region is darker than the area near the cheeks. A rectangular target window is moved over the input image and the Haar feature is calculated at each position; it is then compared with a learned threshold that differentiates objects from non-objects.

Feature extraction is based on three approaches:

• Holistic
• Feature based
• Hybrid

The OpenCV library implementation has led to a generation of object detection classifiers that use Haar-like features over the image to detect features of a human face such as the two eyes, nose, mouth, and ears.


Viola and Jones used summed-area tables, called integral images. An integral image is a two-dimensional lookup table, in the form of a matrix of the same size as the original image, that allows the pixel sum of a rectangular area at any position to be computed using four lookups:

$\text{Sum} = I(C) + I(A) - I(B) - I(D)$

where A, B, C, and D are the points of the integral image $I$ at the corners of the rectangle:

A        B

D        C
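The four-lookup rectangle sum can be illustrated with a short NumPy sketch. The helper names `integral_image` and `rect_sum`, the zero-padded table layout, and the 4x4 sample image are assumptions made for this example; A is taken as the top-left corner, B top-right, C bottom-right, and D bottom-left.

```python
import numpy as np

def integral_image(img):
    # Zero-padded summed-area table: ii[r, c] holds the sum of img[:r, :c].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom, left:right] as I(C) + I(A) - I(B) - I(D).
    return ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left]

img = np.arange(16).reshape(4, 4)   # hypothetical 4x4 image
ii = integral_image(img)

# A two-rectangle Haar-like feature: upper half minus lower half.
two_rect = rect_sum(ii, 0, 0, 2, 4) - rect_sum(ii, 2, 0, 4, 4)
```

Once the table is built, every rectangle sum costs the same four lookups regardless of the rectangle size, which is what makes evaluating many Haar-like features per window affordable.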


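Viola-Jones uses adaptive boosting (AdaBoost) to select a subset of facial features and train the classifier. The AdaBoost classifier is a weighted sum of weak classifiers:

$H(x) = \sum_{j} \alpha_j h_j(x)$

Each weak classifier is a threshold function of a single feature:

$h_j(x) = \begin{cases} -s_j, & \text{if } f_j < \theta_j \\ s_j, & \text{otherwise} \end{cases}$

where $f_j$ is the value of the $j$-th feature, $\theta_j$ is its threshold, and $s_j$ is its polarity.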
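As a small illustration of the weighted sum of weak classifiers, the sketch below evaluates threshold classifiers on made-up feature values. It assumes the usual Viola-Jones form in which a weak classifier returns -s_j when its feature value is below the threshold and s_j otherwise; all numbers and function names are hypothetical.

```python
def weak_classifier(feature_value, threshold, polarity):
    # Returns -polarity below the threshold, +polarity otherwise.
    return -polarity if feature_value < threshold else polarity

def strong_score(feature_values, thresholds, polarities, alphas):
    # Weighted sum of the weak classifier outputs.
    return sum(
        alpha * weak_classifier(f, t, s)
        for f, t, s, alpha in zip(feature_values, thresholds, polarities, alphas)
    )

# Hypothetical feature values and parameters for three weak classifiers.
score = strong_score(
    feature_values=[0.2, 0.9, 0.5],
    thresholds=[0.5, 0.5, 0.5],
    polarities=[1, 1, -1],
    alphas=[0.5, 1.0, 0.25],
)
is_face = score > 0   # sign of the weighted sum gives the decision
```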


4. Support Vector Machine (SVM) 


SVM is a supervised learning model with associated learning algorithms (Saitta, 1995) that analyses data for classification and regression. The SVM training algorithm assigns examples to one of two categories, making it a non-probabilistic binary linear classifier. SVM separates the categories by a clear gap that is as wide as possible, known as the margin. SVM can also perform nonlinear classification using the kernel trick. SVM is capable of delivering high classification accuracy (Srivastava and Bhambhu, 2010) and can be used for text and digit recognition, image classification, and object detection.


SVM constructs a hyperplane between two or more clusters; the hyperplane can also be used to detect outliers in the data. To achieve an optimal parameter setting, SVM requires extensive cross-validation, commonly known as model selection. The choice of kernel function, the standard deviation of the Gaussian kernel, the training data, and the relative weights of the slack variables all affect the overall results. SVM minimizes the empirical classification error while maximizing the geometric margin, and is based on Structural Risk Minimization (SRM). SVM maps the input vectors to a higher-dimensional space in which a maximal separating hyperplane is constructed.
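Given the training data

$\{(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), \ldots, (x_n, y_n)\}$

where $y_n = 1$ or $-1$, the separating hyperplane is

$w \cdot x + b = 0$ ...(1)

where $b$ is a scalar and $w$ is a p-dimensional vector. For linearly separable data, two parallel hyperplanes are selected to separate the classes:

$w \cdot x_i + b \ge 1$ or $w \cdot x_i + b \le -1$

The distance between these hyperplanes is $2/\|w\|$, so the margin is maximized by minimizing $\|w\|$ subject to the constraints above, which can be rearranged as

$y_i(w \cdot x_i + b) \ge 1, \quad 1 \le i \le n$ ...(2)

The hyperplane with the largest margin is defined by $M = 2/\|w\|$, and the training points closest to it satisfy

$y_i[w^T x_i + b] = 1$ ...(3)

The hyperplane with maximum margin is known as the optimal canonical hyperplane. It satisfies

$y_i[w^T x_i + b] \ge 1, \quad i = 1, 2, \ldots, n$ ...(4)

This optimization problem is solved through the Lagrange function

$L_P = L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left(y_i(w^T x_i + b) - 1\right) = \frac{1}{2} w^T w - \sum_{i=1}^{n} \alpha_i \left(y_i(w^T x_i + b) - 1\right)$ ...(5)

It is necessary to find the saddle point $(w_0, b_0, \alpha_0)$: the Lagrangian must be minimized with respect to $w$ and $b$ and maximized with respect to the nonnegative multipliers $\alpha_i$ ($\alpha_i \ge 0$). The problem can be solved in either primal or dual form. Setting the partial derivatives to zero gives

$\partial L / \partial w_0 = 0$, i.e. $w_0 = \sum_{i=1}^{n} \alpha_i y_i x_i$ ...(6)

and

$\partial L / \partial b_0 = 0$, i.e. $\sum_{i=1}^{n} \alpha_i y_i = 0$ ...(7)

Substituting Equations (6) and (7) into Equation (5) converts the primal form into the dual form.

The functional to be minimized, in hinge-loss form, is

$\frac{1}{n}\sum_{i=1}^{n} \max\left(0, 1 - y_i(w^T x_i + b)\right) + \lambda \|w\|^2$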
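To make the margin and the hinge-loss functional concrete, the sketch below evaluates both on a tiny hand-made dataset. It uses the sign convention of Equation (1), in which the hyperplane is w·x + b = 0; the data, the weight vector, and the regularization value `lam=0.1` are hypothetical choices for illustration, not values from the study.

```python
import numpy as np

def margin(w):
    # Distance between the two supporting hyperplanes: 2 / ||w||.
    return 2.0 / np.linalg.norm(w)

def hinge_objective(w, b, X, y, lam):
    # (1/n) * sum_i max(0, 1 - y_i (w . x_i + b)) + lam * ||w||^2
    scores = X @ w + b
    return np.maximum(0.0, 1.0 - y * scores).mean() + lam * np.dot(w, w)

# Hypothetical, linearly separable 2-D points with labels +1 / -1.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.0])
b = 0.0

constraints_hold = bool(np.all(y * (X @ w + b) >= 1.0))
objective = hinge_objective(w, b, X, y, lam=0.1)
```

Because every point satisfies the constraint y_i(w·x_i + b) ≥ 1, the hinge term vanishes here and only the regularization term contributes to the objective.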


5. Convolutional Neural Networks (CNN) 


Convolutional network models (LeCun and Bengio, 1998) are biologically inspired models that have been used successfully in facial recognition, handwritten numeral recognition, and real-time image and video recognition, where they provide better accuracy than other recognition techniques. CNNs use small receptive windows, and training and testing happen densely over the whole image and over multiple scales. In our experiment, recognition is done in real time with multiple images per person for multiple persons. We did not consider invariance to a high degree of rotation; our aim was rapid classification of images. We used multiple images of 25 persons taken at different times, with varying facial expressions and with or without glasses. A CNN is a feed-forward artificial neural network applied to image analysis. It is a variant of the multilayer perceptron that requires little data pre-processing (LeCun et al., 1998), and it is also known as a Shift-Invariant Artificial Neural Network (SIANN) because of its shared weights and translation invariance. CNNs are inspired by the neurons of the human brain (Matsugu et al., 2003); their major advantage is independence from prior knowledge and from human effort in feature design. A CNN consists of:


a. Input layer 
b. Output layer 
c. Multiple hidden layers 


The convolutional layer applies convolution operations to the input image and passes the results to the next layer. The convolution emulates the response of an individual neuron to visual stimuli, and each neuron processes data only within its receptive field. A fully connected feed-forward neural network can serve as the classifier because it learns the features of the data (Ng et al., 2015). The convolution operation reduces the number of free parameters, which allows deeper networks and better results (Heravi et al., 2015), and helps mitigate the vanishing or exploding gradient problem in back-propagation.


Figure 3: Flow chart for CNN (stages: convolutional filters, pooling, fully connected)


Pooling layer: A CNN may include local or global pooling layers, which combine the outputs of a cluster of neurons in one layer into a single neuron in the next layer.

Flattening: All the values from the different cells are stacked into one vector, which becomes the feature input to the ANN. High values in the vector correspond to specific features in the input image that represent the distinct features of a face.

Fully connected layer: It connects every neuron in one layer to every neuron in the next layer.
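The convolution, pooling, and flattening stages can be traced on a toy input with plain NumPy. This is only a sketch of the data flow, not the network trained in the study: the 5x5 image, the 2x2 kernel, and the helper names are assumptions, and the "convolution" is implemented as cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image without padding (cross-correlation).
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Keep the maximum of each non-overlapping size x size block.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 input
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # hypothetical 2x2 filter

fmap = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool(fmap)              # 2x2 after pooling
features = pooled.flatten()          # vector passed to the fully connected layers
```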


6. KNN - K Nearest Neighbor 


KNN is a practical, non-parametric approach to facial recognition (Altman, 1992) based on features such as the eyes, nose, eyebrows, mouth, and ears within the source image. It achieves its robustness by normalizing the size and orientation of the face (Ebrahimpour and Kouzani, 1996). KNN is reported to classify images in less time and with better accuracy, and its faster execution time is reported to outperform SVM and other classifier algorithms. The KNN classifier is an extension of the simple nearest-neighbor rule, which makes a non-parametric decision about the query image based on the distance of its features from the features of other images. The distance between features can be measured as the city-block distance, Euclidean distance, or cosine distance.


• City-block distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

• Euclidean distance: $d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^2\right)^{1/2}$

• Cosine distance: $d_{\cos}(x, y) = 1 - \frac{x \cdot y}{\|x\| \, \|y\|}$
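The three distance measures can be written directly in Python. The function names are illustrative, and the cosine distance is taken in its common form of one minus the cosine similarity, which is an assumption about the formula intended here.

```python
import math

def city_block(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x, y):
    # One minus the cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)
```

For example, for the points (0, 0) and (3, 4) the city-block distance is 7 and the Euclidean distance is 5, while two orthogonal vectors have a cosine distance of 1.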
KNN classifies a query using its closest samples. The result of the KNN classifier depends on the number of samples used and their topological distribution. KNN is among the simplest machine learning algorithms (Kaur, 2012).


If K = 1, the image is simply assigned to the class of its nearest neighbor. In general:

1. Every data point in the dataset has a class label in the set Class = {C1, C2, C3, ..., Cn}.
2. Through the distance matrix, the k closest neighbors are found.
3. The k closest data points are analyzed to determine the most common class label among them.
4. The most common class label is assigned to the query data point.


Parameter selection depends on the data: larger values of k reduce the effect of noise on classification. A good k is selected through heuristic techniques. Accuracy is reduced by noise (irrelevant features) in the dataset or by inconsistent feature scales. Evolutionary algorithms are employed to reduce noise and optimize feature selection.
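The KNN decision rule (find the k closest training points, then take the most common class label) can be sketched as follows. The function name `knn_predict`, the use of Euclidean distance, and the toy two-class data are assumptions for illustration; the study's features are PCA projections rather than 2-D points.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # Distance from the query to every training point, paired with its label.
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest points and vote on their class labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with class labels C1 and C2.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["C1", "C1", "C1", "C2", "C2", "C2"]

label_near_origin = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
label_single = knn_predict(train_X, train_y, (4, 4), k=1)
```

With k = 1 the rule reduces to assigning the label of the single nearest neighbor, as noted in the text.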
Figure 4: Block diagram representation for KNN model


Implementation 


The system architecture of the study is as follows: a camera detects human faces using the Haar Cascade algorithm. When human faces are detected, those images are passed to the Face Classification Module, and we then compare the recognition accuracies of SVM, KNN, and CNN.


Face Classification Module 


• Pre-Processing: Each image is read from a database and converted into a matrix whose dimensions correspond to those of the image. The images are then standardized and finally divided into train and test sets with a split ratio of 0.2.

• PCA: PCA is then applied to both the train and test sets in order to extract the distinct features of all the images. The eigenfaces are then computed from the PCA components that explain 95% of the variance.

• Classification Module: The matrix obtained from PCA is fed into the machine learning module for training.

• Comparison Module: We then compared the recognition accuracies of SVM, KNN, and CNN on both the train and test sets and found that CNN yielded the maximum accuracy.
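The pre-processing and PCA steps of the module can be sketched with NumPy as below. This is one possible reading, not the authors' code: it assumes that 0.2 is the test fraction, that the PCA basis is fitted on the training images and then applied to both sets, and it uses random stand-in data; the function names are hypothetical and per-pixel standardization is reduced to mean-centering.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.2, seed=0):
    # Shuffle indices and hold out a fraction of the images for testing.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(test_fraction * len(X))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def fit_pca(X_train, variance=0.95):
    # Keep the fewest components whose cumulative explained variance reaches the target.
    mean = X_train.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance)) + 1
    return mean, Vt[:k]

# Stand-in data: 33 flattened "images" of 40 pixels each (assumption).
rng = np.random.default_rng(1)
X = rng.random((33, 40))
y = np.arange(33) % 8

X_train, X_test, y_train, y_test = split_train_test(X, y)
mean, components = fit_pca(X_train)
Z_train = (X_train - mean) @ components.T   # matrix fed to the classifiers
Z_test = (X_test - mean) @ components.T
```

With 33 images and a 0.2 test fraction this split yields 27 training and 6 test images, the same set sizes reported for KNN and SVM in the results.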

7. Results 


• Face Detection: Human faces are detected using the Haar Cascade algorithm.


Figure 5: Human face detection 
Image Matrix:

Image matrix of all images (sample rows):

[ 22  20  26 ...  15  22  20]
[ 26  28  28 ...  23  25  32]
...
[163 160 149 ... 106 123 123]
...
[205 208 207 ...  72  79  79]

Figure 6: Sample image matrix


Total variance explained: 


The explained variance tells how much information (variance) can be attributed to each of the principal components. This is important because some information is lost when, for example, a 4-dimensional space is converted to a 2-dimensional space. In the explained variance ratio, the first principal component captures the maximum share of the total variance and the second principal component captures the second largest share.
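As a rough illustration, the share of each retained component can be computed from the variances printed in Figure 7. The values below are rounded readings of that printout (with OCR-damaged digits read as 0), and the shares are relative to the 13 retained components only, not to the total variance of the data, because the discarded components do not appear in the printout.

```python
# Variances of the 13 retained components, rounded from the Figure 7 printout.
variances = [6348.42, 3016.21, 2617.99, 1602.06, 1372.55, 817.26, 694.69,
             600.36, 442.82, 379.32, 358.41, 300.86, 232.26]

total = sum(variances)
shares = [v / total for v in variances]   # share of each component

# Running total of the shares, component by component.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)
```

Among the retained components, the first accounts for roughly a third of the variance, and each later component contributes less than the one before it.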


Explained Variance 


[6348.42173806 3016.20553875 2617.99889663 1602.06073802 1372.54819737
  817.26359991  694.69486815  600.35792127  442.81973497  379.3156861
  358.40688644  300.85874822  232.26132967]


Figure 7: Variance explained by PCA components 


PCA Components: 


Out of all components, we retained only the top 13.

PCA Components

[Numeric printout of the 13 retained principal component vectors.]

Figure 8: Components that explained 95% of variance
Transformed Train Matrix

[Numeric printout of the training images projected onto the 13 retained principal components.]

Figure 9: Sample train set matrix after applying PCA


Transformed Test Matrix

[Numeric printout of the test images projected onto the 13 retained principal components.]

Figure 10: Sample test set matrix after applying PCA


KNN

• Train set size: 27
• Test set size: 6
• Train set accuracy: 1.0
• Test set accuracy: 0.66


              precision    recall  f1-score
          0       0.00      0.00      0.00
          1       0.00      0.00      0.00
          2       1.00      1.00      1.00
          3       1.00      0.50      0.67
          4       1.00      1.00      1.00
          6       0.00      0.00      0.00
          7       1.00      1.00      1.00
avg / total       0.83      0.67      0.72

Figure 11: Classification matrix for KNN
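The per-class precision, recall, and F1-score in such a report, and their support-weighted averages, can be computed as sketched below. The label lists are hypothetical: they are one assignment of six test labels chosen to be consistent with the figures reported for KNN, not the actual predictions from the study.

```python
def classification_scores(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    rows = {}
    weighted = [0.0, 0.0, 0.0]
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)   # times class c was predicted
        support = sum(t == c for t in y_true)     # times class c is the true label
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        rows[c] = (precision, recall, f1, support)
        # Weight each class by its share of the true labels.
        for i, value in enumerate((precision, recall, f1)):
            weighted[i] += value * support / n
    return rows, tuple(weighted)

# Hypothetical labels: four of six test images classified correctly.
y_true = [0, 2, 3, 3, 4, 7]
y_pred = [1, 2, 3, 6, 4, 7]
rows, averages = classification_scores(y_true, y_pred)
rounded = [round(v, 2) for v in averages]
```

Classes that are only ever predicted, never true, have zero support and therefore do not affect the weighted averages, which is why the weighted recall equals the overall test accuracy.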
SVM

• Train set size: 27
• Test set size: 6
• Train set accuracy: 1.0
• Test set accuracy: 0.83

              precision    recall  f1-score
          0       0.00      0.00      0.00
          1       0.00      0.00      0.00
          2       1.00      1.00      1.00
          3       1.00      1.00      1.00
          5       1.00      1.00      1.00
          7       1.00      1.00      1.00
avg / total       0.83      0.83      0.83

Figure 12: Classification matrix for SVM
Prediction using SVM

predicted: andrew    predicted: erin    predicted: andy    predicted: jack
true:      andrew    true:      erin    true:      andy    true:      jack

predicted: amy       predicted: andy
true:      amber     true:      andy

Figure 13: Prediction of images in test set
Figure 14: Eigenfaces


CNN

• Train set size: 162
• Test set size: 59
• Train set accuracy: 1.0
• Test set accuracy: 0.89
• No. of epochs: 15

We therefore conclude that CNN is the best of the models used, because it yielded the maximum accuracy.
Epoch 12/15
162/162 [==============================] - 145s 894ms/step - loss: 0.0069 - acc: 1.0000 - val_loss: 0.1174 - val_acc: 0.8947
Epoch 13/15
162/162 [==============================] - 144s 891ms/step - loss: 0.0065 - acc: 1.0000 - val_loss: 0.1075 - val_acc: 0.8993
Epoch 14/15
162/162 [==============================] - 145s 897ms/step - loss: 0.0062 - acc: 1.0000 - val_loss: 0.1177 - val_acc: 0.9005
Epoch 15/15
162/162 [==============================] - 163s 1s/step - loss: 0.0058 - acc: 1.0000 - val_loss: 0.1221 - val_acc: 0.8970

Figure 15: CNN output




Conclusion 


In this paper we have shown comparative results of three different methods of facial recognition. The study detects human faces using the Haar-Cascade algorithm and extracts global features using PCA. We have seen that KNN has the lowest recognition accuracy and CNN yielded the best validation accuracy. Two factors affect the accuracy of the system: facial recognition and face viewpoints. The challenge is that all the images must have the same dimensions and the same color depth; otherwise the feature extraction will be inconsistent. The biggest challenge for CNN is the number of images: to achieve considerable recognition accuracy with a CNN, the number of training images should be large.


This research will help developers choose the best algorithm for facial recognition, which can be implemented in security systems, retail stores, and many other applicable areas. In the future, efforts can be made to test on a larger set of images in order to improve the accuracy of CNN. Efforts can also be made to study other machine learning classification algorithms and to combine some of them into a more complex system that achieves higher recognition accuracy and can handle large amounts of data.